7 results for 060102 Bioinformatics

at Brock University, Canada


Relevance: 10.00%

Abstract:

Bioinformatics applies computers to problems in molecular biology. Previous research has not addressed edit metric decoders. Decoders for quaternary edit metric codes are finding use in bioinformatics problems with applications to DNA. By using side effect machines, we hope to provide efficient decoding algorithms for this open problem. Two ideas for decoding algorithms are presented and examined. Both decoders use Side Effect Machines (SEMs), which are generalizations of finite state automata. Single Classifier Machines (SCMs) use a single side effect machine to classify all words within a code. Locking Side Effect Machines (LSEMs) use multiple side effect machines to create a tree structure of subclassification. The goal is to examine these techniques and provide new decoders for existing codes. Best practices for creating these two new types of edit metric decoders are also presented.
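To make the decoding problem concrete, the following is a minimal sketch of a brute-force nearest-codeword decoder under the edit (Levenshtein) metric. It is not the SEM-based method of the thesis; it only illustrates the baseline that an efficient decoder would need to improve on. The function names `edit_distance` and `decode` and the toy code used below are assumptions made for illustration.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def decode(received: str, code: Iterable[str]) -> List[str]:
    """Return every codeword at minimum edit distance from `received`.

    More than one codeword is returned when the received word is ambiguous.
    """
    scored = [(edit_distance(received, word), word) for word in code]
    best = min(dist for dist, _ in scored)
    return [word for dist, word in scored if dist == best]


if __name__ == "__main__":
    toy_code = ["AAAA", "CCCC", "GGGG", "TTTT"]  # hypothetical quaternary code
    print(decode("AACA", toy_code))
```

This exhaustive search compares the received word against every codeword, so its cost grows with the size of the code, which is the kind of expense a faster decoder would aim to avoid.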

Relevance: 10.00%

Abstract:

Understanding the machinery of gene regulation that controls gene expression has been one of the main focuses of bioinformaticians for years. We use a multi-objective genetic algorithm to evolve a specialized version of side effect machines for degenerate motif discovery. We compare some suggested objectives for the motifs they find, test different multi-objective scoring schemes and probabilistic models for the background sequences, and report our results on a synthetic dataset and some biological benchmarking suites. We conclude with a comparison of our algorithm with some widely used motif discovery algorithms in the literature and suggest future directions for research in this area.
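As an illustration of what a degenerate motif is, the sketch below matches a motif written in IUPAC nucleotide codes against DNA sequences and computes the fraction of sequences that contain it. This is only one simple quantity that a motif objective could build on, not the objectives or the evolved machines used in the thesis; the names `find_motif` and `support` are assumptions.

```python
from typing import List, Sequence

# Standard IUPAC nucleotide codes: each symbol stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def matches_at(motif: str, seq: str, start: int) -> bool:
    """True if the degenerate motif matches seq at the given offset."""
    return all(seq[start + i] in IUPAC[sym] for i, sym in enumerate(motif))


def find_motif(motif: str, seq: str) -> List[int]:
    """Return all start positions where the motif matches the sequence."""
    last = len(seq) - len(motif)
    return [s for s in range(last + 1) if matches_at(motif, seq, s)]


def support(motif: str, seqs: Sequence[str]) -> float:
    """Fraction of sequences containing at least one occurrence of the motif."""
    hits = sum(1 for seq in seqs if find_motif(motif, seq))
    return hits / len(seqs)


if __name__ == "__main__":
    print(find_motif("ACN", "ACGTACA"))
    print(support("GGW", ["AGGT", "CCCC"]))
```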

Relevance: 10.00%

Abstract:

Scientists have debated the origin of life on Earth for decades. A number of hypotheses have been proposed as to which emerged first, RNA or DNA, with most scientists in favour of the "RNA World" hypothesis. Assuming RNA emerged first, it follows that RNA polymerases would have appeared before DNA polymerases. Using recombinant DNA technology and bioinformatics, we undertook this study to explore the relationship between RNA polymerases, reverse transcriptases and DNA polymerases. The working hypothesis is that DNA polymerases evolved from reverse transcriptases and that the latter evolved from RNA polymerases. If this hypothesis is correct, one would expect to find various ancient DNA polymerases with varying levels of reverse transcriptase activity. In the first phase of this research project, multiple sequence alignments were made of the protein sequences of 32 prokaryotic DNA-directed DNA polymerases originating from 11 prokaryotic families against 3 viral reverse transcriptases. The data from these alignments were not very conclusive. DNA polymerases with higher levels of reverse transcriptase activity were not confined to ancient organisms, as one would have expected. The second phase of this project focused on conditions that may alter DNA polymerase activity. Various reaction conditions, such as temperature and the use of different ions (Ni2+, Mn2+, Mg2+), were tested. Interestingly, it was found that the DNA polymerase from the Thermus aquaticus family can be made to copy RNA into DNA (i.e. reverse transcriptase activity). Thus it was shown that under appropriate conditions (ions and reaction temperatures) reverse transcriptase activity can be induced in DNA polymerase. In the third phase of this study, recombinant DNA technology was used to generate a chimeric DNA polymerase in an attempt to identify the region(s) of the polymerase responsible for RNA-directed DNA polymerase activity. 
The two DNA polymerases employed were those of Thermus aquaticus and Thermus thermophilus. As in the second phase, various reaction conditions were investigated. The data indicated that the newly engineered chimeric DNA polymerase can be induced to copy RNA into DNA. Thus the intrinsic reverse transcriptase activity found in ancient DNA polymerases was localized to a domain and can be induced via appropriate reaction conditions.

Relevance: 10.00%

Abstract:

Genome sequence varies in numerous ways among individuals, although the gross architecture is fixed for all humans. Retrotransposons create one of the most abundant types of structural variants in the human genome and are divided into many families, with certain members in some families, e.g., L1, Alu, SVA, and HERV-K, remaining active for transposition. Along with other types of genomic variants, retrotransposon-derived variants contribute to the whole spectrum of genome variants in humans. With the advancement of sequencing techniques, many human genomes are being sequenced at the individual level, fueling comparative research on these variants among individuals. In this thesis, the evolution and functional impact of structural variations are examined, focusing primarily on retrotransposons in the context of human evolution. The thesis comprises three studies, presented in three data chapters. First, the recent evolution of all human-specific AluYb members, representing the second most active subfamily of Alus, was tracked to identify their source/master copy using a novel approach. All human-specific AluYb elements from the reference genome were extracted and aligned with one another to construct clusters of similar copies, and each cluster was analyzed to infer the evolutionary relationships among its members. The approach resulted in the identification of one major driver copy of all human-specific Yb8 elements and the source copy of the Yb9 lineage. Three new subfamilies within the AluYb family (Yb8a1, Yb10 and Yb11) were also identified, with Yb11 being the youngest and most polymorphic. Second, an attempt to establish a relationship between transposable elements (TEs) and tandem repeats (TRs) was made at a genome-wide scale for the first time. Upon sequence comparison, positional cross-checking and other relevant analyses, it was observed that over 20% of all TRs are derived from TEs. 
This result established the first connection between these two types of repetitive elements and extends our appreciation of the impact of TEs on genomes. Furthermore, only 6% of these TE-derived TRs follow the already postulated initiation and expansion mechanisms, suggesting that the others are likely to follow a yet-unidentified mechanism. Third, by combining multiple computational approaches involving all types of genetic variation published so far, including transposable elements, the first whole genome sequence of the most recent common ancestor of all modern human populations that diverged into different populations around 125,000-100,000 years ago was constructed. The study shows that the current reference genome sequence is 8.89 million base pairs larger than our common ancestor’s genome, a difference contributed by a whole spectrum of genetic mechanisms. The use of this ancestral reference genome to facilitate the analysis of personal genomes was demonstrated using an example genome, along with more insightful recent evolutionary analyses involving the Neanderthal genome. The three data chapters presented in this thesis conclude that tandem repeats and transposable elements are not two entirely distinct, isolated elements, as over 20% of TRs are actually derived from TEs. Certain subfamilies of TEs are themselves still evolving, generating newer subfamilies. The evolutionary analyses of all TEs along with other genomic variants helped to construct the genome sequence of the most recent common ancestor of all modern human populations, which provides a better alternative to the human reference genome and can be a useful resource for the study of personal genomics, population genetics, and human and primate evolution.

Relevance: 10.00%

Abstract:

DNA assembly is among the most fundamental and difficult problems in bioinformatics. Near-optimal assembly solutions are available for bacterial and small genomes; however, assembling large and complex genomes, especially the human genome, using Next-Generation Sequencing (NGS) technologies has been shown to be very difficult because of the highly repetitive and complex nature of the human genome, short read lengths, uneven data coverage, and tools that are not specifically built for human genomes. Moreover, many algorithms do not even scale to human genome datasets containing hundreds of millions of short reads. The DNA assembly problem is usually divided into several subproblems, including DNA data error detection and correction, contig creation, scaffolding and contig orientation; each can be seen as a distinct research area. This thesis focuses specifically on creating contigs from short reads and combining them with outputs from other tools in order to obtain better results. Three assemblers, SOAPdenovo [Li09], Velvet [ZB08] and Meraculous [CHS+11], are selected for comparison in this thesis. The results show that this thesis’ work produces results comparable to those of other assemblers, and that combining our contigs with outputs from other tools produces the best results, outperforming all other investigated assemblers.
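The contig creation step can be illustrated with a toy de Bruijn graph sketch. This is a generic textbook technique chosen for illustration, not the method developed in the thesis: it assumes error-free reads, ignores reverse complements and coverage, and the function names `build_graph` and `contigs` are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple

Graph = DefaultDict[str, Set[str]]


def build_graph(reads: Iterable[str], k: int) -> Tuple[Graph, Graph]:
    """Build successor and predecessor maps between (k-1)-mers.

    Every k-mer in a read contributes one edge: prefix -> suffix.
    """
    succ: Graph = defaultdict(set)
    pred: Graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            left, right = kmer[:-1], kmer[1:]
            succ[left].add(right)
            pred[right].add(left)
    return succ, pred


def contigs(reads: Iterable[str], k: int) -> List[str]:
    """Spell out maximal non-branching paths of the graph as contigs.

    Isolated cycles (where every node has one predecessor and one
    successor) are not reported by this simple sketch.
    """
    succ, pred = build_graph(reads, k)
    nodes = set(succ) | set(pred)

    def is_simple(node: str) -> bool:
        # Exactly one way in and one way out: the path continues through it.
        return len(succ.get(node, ())) == 1 and len(pred.get(node, ())) == 1

    result = []
    for node in nodes:
        if is_simple(node):
            continue  # paths only start at branching or terminal nodes
        for nxt in succ.get(node, ()):
            path = node + nxt[-1]
            while is_simple(nxt):
                nxt = next(iter(succ[nxt]))
                path += nxt[-1]
            result.append(path)
    return sorted(result)


if __name__ == "__main__":
    print(contigs(["ACGTTG", "GTTGCA"], k=4))
```

With two overlapping reads and no repeated (k-1)-mers, the graph is a single path and the reads merge into one contig; a branch in the graph splits the output into several shorter contigs, which is why repetitive genomes are hard to assemble.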

Relevance: 10.00%

Abstract:

Ordered gene problems are a very common class of optimization problems. Because of their popularity, countless algorithms have been developed in an attempt to find high-quality solutions to them. It is also common to see many different types of problems reduced to ordered gene style problems, as many popular heuristics and metaheuristics exist for them. Multiple ordered gene problems are studied, namely the travelling salesman problem, the bin packing problem, and the graph colouring problem. In addition, two bioinformatics problems not traditionally seen as ordered gene problems are studied: DNA error correction and DNA fragment assembly. These problems are studied with multiple variations and combinations of heuristics and metaheuristics with two distinct types of representations. The majority of the algorithms are built around the Recentering-Restarting Genetic Algorithm. The algorithm variations were successful on all problems studied, particularly the two bioinformatics problems. For DNA error correction, multiple cases were found with 100% of the codes being corrected. The algorithm variations were also able to beat all other state-of-the-art DNA fragment assemblers on 13 out of 16 benchmark problem instances.
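The idea of an ordered gene representation can be sketched for bin packing: a candidate solution is a permutation of the items, which a greedy first-fit heuristic decodes into a packing. The code below is a minimal illustration with a plain swap-mutation hill climber; it is not the Recentering-Restarting Genetic Algorithm, and the names `bins_used`, `swap_mutation` and `hill_climb` are assumptions. It also assumes no item is larger than the bin capacity.

```python
import random
from typing import List, Sequence, Tuple


def bins_used(order: Sequence[int], sizes: Sequence[int], capacity: int) -> int:
    """Decode a permutation with first-fit and return the number of bins."""
    free: List[int] = []  # remaining space in each open bin
    for item in order:
        size = sizes[item]
        for b, space in enumerate(free):
            if size <= space:
                free[b] = space - size
                break
        else:
            free.append(capacity - size)  # no open bin fits: open a new one
    return len(free)


def swap_mutation(order: Sequence[int], rng: random.Random) -> List[int]:
    """Return a copy of the permutation with two positions exchanged."""
    child = list(order)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def hill_climb(sizes: Sequence[int], capacity: int,
               steps: int = 1000, seed: int = 0) -> Tuple[List[int], int]:
    """Keep any swap that does not increase the number of bins used."""
    rng = random.Random(seed)
    best = list(range(len(sizes)))
    best_cost = bins_used(best, sizes, capacity)
    for _ in range(steps):
        candidate = swap_mutation(best, rng)
        cost = bins_used(candidate, sizes, capacity)
        if cost <= best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


if __name__ == "__main__":
    sizes = [4, 4, 4, 6, 6, 6]
    print(bins_used(range(6), sizes, 10))            # items in index order
    print(bins_used([0, 3, 1, 4, 2, 5], sizes, 10))  # alternating 4s and 6s
```

Because the chromosome is always a permutation, operators such as swap mutation never produce an invalid solution; only the quality of the decoded packing changes with the order.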

Relevance: 10.00%

Abstract:

The Madagascar periwinkle [Catharanthus roseus (L.) G. Don] is a commercially important horticultural flower species and is the only source for several pharmaceutically valuable monoterpenoid indole alkaloids (MIAs), including the powerful antihypertensive ajmalicine and the antineoplastic agents vincristine and vinblastine. While biosynthesis of MIA precursors has been elucidated, conversion of the common MIA precursor strictosidine to MIAs of different families, for example ajmalicine, catharanthine or vindoline, remains uncharacterized. Deglycosylation of strictosidine by the key enzyme Strictosidine beta-glucosidase (SGD) leads to a pool of uncharacterized reaction products that are diverted into the different MIA families, but the downstream reactions are uncharacterized. Screening of 3600 EMS (ethyl methane sulfonate) mutagenized C. roseus plants to identify mutants with altered MIA profiles yielded one plant with high ajmalicine, and low catharanthine and vindoline content. RNA sequencing and comparative bioinformatics of mutant and wildtype plants showed up-regulation of SGD and the transcriptional repressor Zinc finger Catharanthus transcription factor (ZCT1) in the mutant line. The increased SGD activity in mutants seems to yield a larger pool of uncharacterized SGD reaction products that are channeled away from catharanthine and vindoline towards biosynthesis of ajmalicine when compared to the wildtype. Further bioinformatic analyses, and crossings between mutant and wildtype suggest a transcription factor upstream of SGD and ZCT1 to be mutated, leading to up-regulation of Sgd and Zct1. The crossing experiments further show that biosynthesis of the different MIA families is differentially regulated and highly complex. Three new transcription factors were identified by bioinformatics that seem to be involved in the regulation of Zct1 and Sgd expression, leading to the high ajmalicine phenotype. 
Increased cathenamine reductase activity in the mutant converts the pool of SGD reaction products into ajmalicine and its stereoisomer tetrahydroalstonine. The stereochemistry of ajmalicine and tetrahydroalstonine biosynthesis in vivo and in vitro was further characterized. In addition, a new clade of perakine reductase-like enzymes was identified that reduces the SGD reaction product vallesiachotamine in a stereo-specific manner, characterizing one of the many reactions immediately downstream of SGD that determine the different MIA families. This study establishes that RNA sequencing and comparative bioinformatics, in combination with molecular and biochemical characterization, are valuable tools to determine the genetic basis for mutations that trigger phenotypes, and this approach can also be used for identification of new enzymes and transcription factors.